{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "1bc8LrmzTp4X"
   },
   "source": [
    "# 6-4 使用多GPU训练模型\n",
    "\n",
    "如果使用多GPU训练模型，推荐使用内置fit方法，较为方便，仅需添加2行代码。\n",
    "\n",
    "在Colab笔记本中：修改->笔记本设置->硬件加速器 中选择 GPU\n",
    "\n",
    "注：以下代码只能在Colab 上才能正确执行。\n",
    "\n",
    "可通过以下colab链接测试效果《tf_多GPU》：\n",
    "\n",
    "https://colab.research.google.com/drive/1j2kp_t0S_cofExSN7IyJ4QtMscbVlXU-"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "MirroredStrategy过程简介：\n",
    "\n",
    "* 训练开始前，该策略在所有 N 个计算设备上均各复制一份完整的模型；\n",
    "* 每次训练传入一个批次的数据时，将数据分成 N 份，分别传入 N 个计算设备（即数据并行）；\n",
    "* N 个计算设备使用本地变量（镜像变量）分别计算自己所获得的部分数据的梯度；\n",
    "* 使用分布式计算的 All-reduce 操作，在计算设备间高效交换梯度数据并进行求和，使得最终每个设备都有了所有设备的梯度之和；\n",
    "* 使用梯度求和的结果更新本地变量（镜像变量）；\n",
    "* 当所有设备均更新本地变量后，进行下一轮训练（即该并行策略是同步的）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "id": "jD03kaV-TcqS",
    "outputId": "800c0223-a08a-443d-f866-4afa3a55b20d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TensorFlow 2.x selected.\n"
     ]
    }
   ],
   "source": [
    "%tensorflow_version 2.x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "id": "lejRTJyuTzIN",
    "outputId": "d27c2d02-aaa0-4858-849b-538317a6887f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.1.0\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "print(tf.__version__)\n",
    "from tensorflow.keras import * \n",
    "\n",
    "#打印时间分割线\n",
    "@tf.function\n",
    "def printbar():\n",
    "    ts = tf.timestamp()\n",
    "    today_ts = ts%(24*60*60)\n",
    "\n",
    "    hour = tf.cast(today_ts//3600+8,tf.int32)%tf.constant(24)\n",
    "    minite = tf.cast((today_ts%3600)//60,tf.int32)\n",
    "    second = tf.cast(tf.floor(today_ts%60),tf.int32)\n",
    "    \n",
    "    def timeformat(m):\n",
    "        if tf.strings.length(tf.strings.format(\"{}\",m))==1:\n",
    "            return(tf.strings.format(\"0{}\",m))\n",
    "        else:\n",
    "            return(tf.strings.format(\"{}\",m))\n",
    "    \n",
    "    timestring = tf.strings.join([timeformat(hour),timeformat(minite),\n",
    "                timeformat(second)],separator = \":\")\n",
    "    tf.print(\"==========\"*8,end = \"\")\n",
    "    tf.print(timestring)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "id": "YwPhga6iTzhl",
    "outputId": "41b1f8d2-df2e-419e-8d45-5caf7e977f7b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 Physical GPU, 2 Logical GPUs\n"
     ]
    }
   ],
   "source": [
    "#此处在colab上使用1个GPU模拟出两个逻辑GPU进行多GPU训练\n",
    "gpus = tf.config.experimental.list_physical_devices('GPU')\n",
    "if gpus:\n",
    "  # 设置两个逻辑GPU模拟多GPU训练\n",
    "  try:\n",
    "    tf.config.experimental.set_virtual_device_configuration(gpus[0],\n",
    "        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),\n",
    "         tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])\n",
    "    logical_gpus = tf.config.experimental.list_logical_devices('GPU')\n",
    "    print(len(gpus), \"Physical GPU,\", len(logical_gpus), \"Logical GPUs\")\n",
    "  except RuntimeError as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Dr21VRFkUHC1"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wDT0vnEmUwng"
   },
   "source": [
    "### 一、准备数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "Dv45ARjlUG49",
    "outputId": "3a1f809a-07b0-46f6-9261-efa65444dab2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz\n",
      "2113536/2110848 [==============================] - 0s 0us/step\n"
     ]
    }
   ],
   "source": [
    "MAX_LEN = 300\n",
    "BATCH_SIZE = 32\n",
    "(x_train,y_train),(x_test,y_test) = datasets.reuters.load_data()\n",
    "x_train = preprocessing.sequence.pad_sequences(x_train,maxlen=MAX_LEN)\n",
    "x_test = preprocessing.sequence.pad_sequences(x_test,maxlen=MAX_LEN)\n",
    "\n",
    "MAX_WORDS = x_train.max()+1\n",
    "CAT_NUM = y_train.max()+1\n",
    "\n",
    "ds_train = tf.data.Dataset.from_tensor_slices((x_train,y_train)) \\\n",
    "          .shuffle(buffer_size = 1000).batch(BATCH_SIZE) \\\n",
    "          .prefetch(tf.data.experimental.AUTOTUNE).cache()\n",
    "   \n",
    "ds_test = tf.data.Dataset.from_tensor_slices((x_test,y_test)) \\\n",
    "          .shuffle(buffer_size = 1000).batch(BATCH_SIZE) \\\n",
    "          .prefetch(tf.data.experimental.AUTOTUNE).cache()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "xZUUOe58U0RD"
   },
   "source": [
    "### 二、定义模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "m8iPgdN2UGu1"
   },
   "outputs": [],
   "source": [
    "tf.keras.backend.clear_session()\n",
    "def create_model():\n",
    "    \n",
    "    model = models.Sequential()\n",
    "\n",
    "    model.add(layers.Embedding(MAX_WORDS,7,input_length=MAX_LEN))\n",
    "    model.add(layers.Conv1D(filters = 64,kernel_size = 5,activation = \"relu\"))\n",
    "    model.add(layers.MaxPool1D(2))\n",
    "    model.add(layers.Conv1D(filters = 32,kernel_size = 3,activation = \"relu\"))\n",
    "    model.add(layers.MaxPool1D(2))\n",
    "    model.add(layers.Flatten())\n",
    "    model.add(layers.Dense(CAT_NUM,activation = \"softmax\"))\n",
    "    return(model)\n",
    "\n",
    "def compile_model(model):\n",
    "    model.compile(optimizer=optimizers.Nadam(),\n",
    "                loss=losses.SparseCategoricalCrossentropy(),\n",
    "                metrics=[metrics.SparseCategoricalAccuracy(),metrics.SparseTopKCategoricalAccuracy(5)]) \n",
    "    return(model)\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "eT2U6TYbU5k1"
   },
   "source": [
    "### 三，训练模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "colab_type": "code",
    "id": "-FF9lozBURfE",
    "outputId": "2e613834-a13c-45ab-e82e-351e651550c5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:NCCL is not supported when using virtual GPUs, fallingback to reduction to one device\n",
      "INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')\n",
      "Model: \"sequential\"\n",
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "embedding (Embedding)        (None, 300, 7)            216874    \n",
      "_________________________________________________________________\n",
      "conv1d (Conv1D)              (None, 296, 64)           2304      \n",
      "_________________________________________________________________\n",
      "max_pooling1d (MaxPooling1D) (None, 148, 64)           0         \n",
      "_________________________________________________________________\n",
      "conv1d_1 (Conv1D)            (None, 146, 32)           6176      \n",
      "_________________________________________________________________\n",
      "max_pooling1d_1 (MaxPooling1 (None, 73, 32)            0         \n",
      "_________________________________________________________________\n",
      "flatten (Flatten)            (None, 2336)              0         \n",
      "_________________________________________________________________\n",
      "dense (Dense)                (None, 46)                107502    \n",
      "=================================================================\n",
      "Total params: 332,856\n",
      "Trainable params: 332,856\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "Train for 281 steps, validate for 71 steps\n",
      "Epoch 1/10\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').\n",
      "281/281 [==============================] - 15s 53ms/step - loss: 2.0270 - sparse_categorical_accuracy: 0.4653 - sparse_top_k_categorical_accuracy: 0.7481 - val_loss: 1.7517 - val_sparse_categorical_accuracy: 0.5481 - val_sparse_top_k_categorical_accuracy: 0.7578\n",
      "Epoch 2/10\n",
      "281/281 [==============================] - 4s 14ms/step - loss: 1.5206 - sparse_categorical_accuracy: 0.6045 - sparse_top_k_categorical_accuracy: 0.7938 - val_loss: 1.5715 - val_sparse_categorical_accuracy: 0.5993 - val_sparse_top_k_categorical_accuracy: 0.7983\n",
      "Epoch 3/10\n",
      "281/281 [==============================] - 4s 14ms/step - loss: 1.2178 - sparse_categorical_accuracy: 0.6843 - sparse_top_k_categorical_accuracy: 0.8547 - val_loss: 1.5232 - val_sparse_categorical_accuracy: 0.6327 - val_sparse_top_k_categorical_accuracy: 0.8112\n",
      "Epoch 4/10\n",
      "281/281 [==============================] - 4s 13ms/step - loss: 0.9127 - sparse_categorical_accuracy: 0.7648 - sparse_top_k_categorical_accuracy: 0.9113 - val_loss: 1.6527 - val_sparse_categorical_accuracy: 0.6296 - val_sparse_top_k_categorical_accuracy: 0.8201\n",
      "Epoch 5/10\n",
      "281/281 [==============================] - 4s 14ms/step - loss: 0.6606 - sparse_categorical_accuracy: 0.8321 - sparse_top_k_categorical_accuracy: 0.9525 - val_loss: 1.8791 - val_sparse_categorical_accuracy: 0.6158 - val_sparse_top_k_categorical_accuracy: 0.8219\n",
      "Epoch 6/10\n",
      "281/281 [==============================] - 4s 14ms/step - loss: 0.4919 - sparse_categorical_accuracy: 0.8799 - sparse_top_k_categorical_accuracy: 0.9725 - val_loss: 2.1282 - val_sparse_categorical_accuracy: 0.6037 - val_sparse_top_k_categorical_accuracy: 0.8112\n",
      "Epoch 7/10\n",
      "281/281 [==============================] - 4s 14ms/step - loss: 0.3947 - sparse_categorical_accuracy: 0.9051 - sparse_top_k_categorical_accuracy: 0.9814 - val_loss: 2.3033 - val_sparse_categorical_accuracy: 0.6046 - val_sparse_top_k_categorical_accuracy: 0.8094\n",
      "Epoch 8/10\n",
      "281/281 [==============================] - 4s 14ms/step - loss: 0.3335 - sparse_categorical_accuracy: 0.9207 - sparse_top_k_categorical_accuracy: 0.9863 - val_loss: 2.4255 - val_sparse_categorical_accuracy: 0.5993 - val_sparse_top_k_categorical_accuracy: 0.8099\n",
      "Epoch 9/10\n",
      "281/281 [==============================] - 4s 14ms/step - loss: 0.2919 - sparse_categorical_accuracy: 0.9304 - sparse_top_k_categorical_accuracy: 0.9911 - val_loss: 2.5571 - val_sparse_categorical_accuracy: 0.6020 - val_sparse_top_k_categorical_accuracy: 0.8126\n",
      "Epoch 10/10\n",
      "281/281 [==============================] - 4s 14ms/step - loss: 0.2617 - sparse_categorical_accuracy: 0.9342 - sparse_top_k_categorical_accuracy: 0.9937 - val_loss: 2.6700 - val_sparse_categorical_accuracy: 0.6077 - val_sparse_top_k_categorical_accuracy: 0.8148\n",
      "CPU times: user 1min 2s, sys: 8.59 s, total: 1min 10s\n",
      "Wall time: 58.5 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# 增加以下两行代码\n",
    "strategy = tf.distribute.MirroredStrategy()  \n",
    "with strategy.scope():\n",
    "    model = create_model()\n",
    "    model.summary()\n",
    "    model = compile_model(model)\n",
    "\n",
    "history = model.fit(ds_train,validation_data = ds_test,epochs = 10)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "9BcJej7fVZja"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "name": "tf_多GPU.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
